Slipstream-based Steering for Clustered Microarchitectures

Authors

  • Eric Rotenberg
  • Nikhil Gupta
Abstract

To harvest increasing levels of ILP while maintaining a fast clock, clustered microarchitectures have been proposed. However, the fast clock enabled by clustering comes at the cost of multiple cycles to communicate values among clusters. A chief performance limiter of a clustered microarchitecture is inter-cluster communication between instructions. Specifically, inter-cluster communication between critical-path instructions is the most harmful. The slipstream paradigm identifies critical-path instructions in the form of effectual instructions. We propose eliminating virtually all inter-cluster communication among effectual instructions, simply by ensuring that the entire effectual component of the program executes within a cluster. This thesis proposes two execution models: the replication model and the dedicated-cluster model. In the replication model, a copy of the effectual component is executed on each of the clusters and the ineffectual instructions are shared among the clusters. In the dedicated-cluster model, the effectual component is executed on a single cluster (the effectual cluster), while all ineffectual instructions are steered to the remaining clusters. Outcomes of ineffectual instructions are not needed (in hindsight), hence their execution can be exposed to inter-cluster communication latency without significantly impacting overall performance. IPC of the replication model on dual clusters and quad clusters is virtually independent of inter-cluster communication latency. IPC decreases by 1.3% and 0.8%, on average, for a dual-cluster and quad-cluster microarchitecture, respectively, when inter-cluster communication latency increases from 2 cycles to 16 cycles. In contrast, IPC of the best-performing dependence-based steering decreases by 35% and 55%, on average, for a dual-cluster and quad-cluster microarchitecture, respectively, over the same latency range. 
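
The two execution models differ only in where effectual and ineffectual instructions are steered. The following Python sketch illustrates that difference under stated assumptions. Each dynamic instruction is assumed to be already labeled effectual or ineffectual; the labeling itself, which the slipstream paradigm derives in hindsight, is not modeled. Ineffectual instructions are distributed round-robin, a policy chosen here for simplicity rather than taken from the thesis. The names `Instr`, `steer_replication`, and `steer_dedicated` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    pc: int          # hypothetical identifier for a dynamic instruction
    effectual: bool  # assumed label; detection of ineffectual work is not modeled


def steer_replication(instrs, num_clusters):
    """Replication model sketch: every effectual instruction is copied to
    all clusters, while ineffectual instructions are shared among the
    clusters (round-robin here, an assumed policy)."""
    assignment = {c: [] for c in range(num_clusters)}
    next_cluster = 0
    for ins in instrs:
        if ins.effectual:
            # Each cluster runs its own copy of the effectual component,
            # so effectual producers and consumers are always co-located.
            for c in range(num_clusters):
                assignment[c].append(ins)
        else:
            assignment[next_cluster].append(ins)
            next_cluster = (next_cluster + 1) % num_clusters
    return assignment


def steer_dedicated(instrs, num_clusters, effectual_cluster=0):
    """Dedicated-cluster model sketch: all effectual instructions go to one
    cluster; ineffectual instructions are spread (round-robin, assumed)
    over the remaining clusters."""
    if num_clusters < 2:
        raise ValueError("dedicated-cluster model needs at least two clusters")
    others = [c for c in range(num_clusters) if c != effectual_cluster]
    assignment = {c: [] for c in range(num_clusters)}
    i = 0
    for ins in instrs:
        if ins.effectual:
            assignment[effectual_cluster].append(ins)
        else:
            assignment[others[i % len(others)]].append(ins)
            i += 1
    return assignment
```

For example, with `instrs = [Instr(i, i % 3 != 2) for i in range(6)]` and two clusters, the replication sketch places instructions 0, 1, 3, and 4 on both clusters, while the dedicated-cluster sketch keeps them all on cluster 0 and sends only the ineffectual instructions 2 and 5 to cluster 1. In both cases, no dependence between effectual instructions crosses a cluster boundary.
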
For dual clusters and quad clusters with low latencies (fewer than 8 cycles), slipstream-based steering underperforms conventional steering because improved latency tolerance is outweighed by higher contention for execution bandwidth within clusters. However, the balance shifts at higher latencies. For a dual-cluster microarchitecture, dedicated-cluster-based steering outperforms the best conventional steering on average by 10% and 24% at 8 and 16 cycles, respectively. For a quad-cluster microarchitecture, replication-based steering outperforms the best conventional steering on average by 10% and 32% at 8 and 16 cycles, respectively. Slipstream-based steering desensitizes the IPC performance of a clustered microarchitecture to tens of cycles of inter-cluster communication latency. As feature sizes shrink, it will take multiple cycles to propagate signals across the processor chip. For a clustered microarchitecture, this implies that with further scaling of feature size, the inter-cluster communication latency will increase to the point where microarchitects must manage …
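
The trade-off between latency tolerance and intra-cluster contention can be made concrete with a toy first-order model. This is a hypothetical illustration, not the simulation methodology used in the thesis. Every instruction in a dependent chain is treated as having one incoming edge and as taking one cycle, plus a communication cost (the fraction of cross-cluster edges times the latency), plus a contention penalty. Conventional dependence-based steering is modeled as exposing some edges to communication latency with no extra contention. Slipstream-based steering is modeled as exposing no effectual edges but paying a contention penalty. The functions `chain_cycles` and `crossover_latency` and all parameter values are assumptions.

```python
def chain_cycles(n_instrs, cross_fraction, comm_latency, contention_penalty=0.0):
    """Toy estimate of cycles for a dependent chain of n_instrs instructions.

    Assumes each instruction has one incoming dependence edge and a base
    execution latency of one cycle (both simplifications).
    cross_fraction: fraction of edges that cross clusters
    comm_latency: cycles added to each cross-cluster edge
    contention_penalty: hypothetical extra cycles per instruction from
        competing for a cluster's execution bandwidth
    """
    per_instr = 1.0 + cross_fraction * comm_latency + contention_penalty
    return n_instrs * per_instr


def crossover_latency(cross_fraction, contention_penalty):
    """Latency at which a zero-communication policy that pays
    contention_penalty per instruction breaks even with a policy that
    exposes cross_fraction of its edges to communication latency."""
    return contention_penalty / cross_fraction
```

With illustrative values of `cross_fraction=0.25` for conventional steering and `contention_penalty=1.0` for slipstream-based steering, the model breaks even at 4 cycles. At 2 cycles, the conventional chain of 100 instructions takes 150 cycles versus 200 cycles. At 16 cycles, it takes 500 cycles versus 200 cycles. The break-even point follows only from these arbitrary numbers and does not reproduce the thesis's measurements. What the model does capture is the qualitative shape: the conventional cost grows with latency, while the slipstream-based cost does not.
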

Similar resources

Intra-level Incomplete Bypassing: Achieving Performance and Power Efficiency

Villasenor, Eric P. M.S.E.C.E., Purdue University, December 2007. Intra-level Incomplete Bypassing: Achieving Performance and Power Efficiency. Major Professor: Mithuna S. Thottethodi. Researchers have proposed clustered microarchitectures to capture the benefits of high performance and high energy efficiency. Typically, clustered microarchitectures offer fast local bypasses (i.e., value forw...

Extending OpenMP to Support Slipstream Execution Mode

OpenMP has emerged as a widely accepted standard for writing shared memory programs. Hardware-specific extensions such as data placement are usually needed to improve the scalability of applications based on this standard. This paper investigates the implementation of an OpenMP compiler that supports slipstream execution mode, a new optimization mechanism for CMP-based distributed shared memory...

An Unknown Input Observer for Fault Detection Based on Sliding Mode Observer in Electrical Steering Assist Systems

A steering assist system controls the force transfer behavior of the steering system and improves the steering probability of the vehicle. Moreover, it is an interface between the driver and vehicle. Fault detection in electrical assisted steering systems is a challenging problem due to the frequent use of these systems. This paper addresses the fault detection and reconstruction in automotive elect...

Unified Cluster Assignment and Instruction Scheduling for Clustered VLIW Microarchitectures

There has been a trend towards microarchitectures that have disjoint register files to reduce the register file access time. The register file is partitioned and a set of functional units is assigned to each partitioned register file. The partitioned register file and its set of functional units constitute a cluster. Instruction scheduling for a clustered microprocessor requires assignment and scheduling...

Publication date: 2003